A study of text-theoretical approach to S-box construction with image encryption applications

Data protection is regarded as one of the biggest issues facing companies that have long relied on public data. Numerous encryption techniques have been used to address these issues and safeguard data from malicious attempts and attacks. A substitution box (S-box) is the basic component of modern block ciphers; it helps ensure robust protection of plain data during encryption and permits its lawful decryption. The goal of this paper is to propose an effective, original, and straightforward technique for the creation of robust S-boxes. A sample S-box is generated in the proposed work using the word “UNITY”, but other words can also be used to generate many strong S-boxes. The ASCII code is used to translate the word “UNITY” into binary form, after which a distinct matrix is constructed for each character of the word. In the next phase, a linear fractional transformation is constructed using these matrices, which is then used to generate the S-box. The constructed S-box is then evaluated against standard security criteria to confirm its cryptographic strength. Its statistical and algebraic resilience is demonstrated by low linear probability and differential probability scores of 0.125 and 0.039, respectively, and a high mean non-linearity score of 111.5. To evaluate its effectiveness in image encryption, digital images are encrypted using the created S-box. The performance and comparative analyses indicate that the suggested S-box is a viable candidate for image encryption applications.

Cryptography is a foundational technology that plays a pivotal role in safeguarding information and ensuring trust in digital interactions. Its importance extends across various domains, from securing online transactions to protecting national security and individual privacy. As the digital landscape continues to evolve, cryptography remains an essential tool for addressing the growing threats and challenges of the digital age. Many algorithms for protecting sensitive information are known in the literature. Ciphers used for plaintext encryption and cipher text decoding are classified into two types: block ciphers and stream ciphers 1,2. The block cipher encrypts and decrypts data block by block. The data block is typically made up of one or more bytes. On the other hand, a stream cipher accomplishes these modifications by utilizing one bit or one byte in a single step. A block cipher has become the most potent approach for protecting sensitive data in contemporary cryptography 3. Block ciphers are simpler to develop and more often used in practical security applications than stream ciphers 4.
Substitution-permutation block ciphers are one of the most common types of block ciphers used to protect data. Permutation and substitution are the key operations used in these ciphers to transform data into an unintelligible form. The permutation function exchanges bits or bytes of the plaintext with other bits or bytes that are also part of the plaintext. In contrast, a nonlinear substitution process swaps out one block of data for another. An S-box is used to replace the data in this instance 5,6. A crucial component of modern block ciphers, an S-box considerably aids in producing garbled output data (cipher text) from the provided input data (plaintext). To make things more difficult for attackers, the S-box should provide a non-linear relationship between the input and output of the data 7. The more uncertainty a substitution box creates for attackers, the more secure the block cipher is considered to be. On the other hand, a weak S-box can undermine the security of a block cipher. A weak S-box has low non-linearity, meaning it does not effectively obscure the relationship between input and output bits. In other words, it does not introduce sufficient complexity in the substitution operation, making it vulnerable to various cryptanalysis techniques. The avalanche effect is a desirable property in cryptography whereby a small change in the input results in a significantly different output. Weak S-boxes often fail to exhibit a strong avalanche effect, meaning small input changes may lead to small or predictable output changes, which can be exploited by attackers. Weak S-boxes may also exhibit linear or affine structures, making it easier for cryptanalysts to derive mathematical relationships between the input and output. This can lead to efficient attacks, such as linear and differential cryptanalysis. The security provided by a block cipher using a substitution box depends entirely upon the effectiveness of the corresponding S-box 8,9.
As a result, researchers have developed novel methods for building key-dependent, dynamic S-boxes. Among algebraic approaches, the linear fractional transformation (LFT) is an essential method for constructing robust S-boxes. The LFT approach was employed by the authors in [10][11][12][13] to construct efficient and reliable S-boxes. When measured against the accepted cryptographic criteria used to assess an S-box, the created S-boxes showed respectable results. In addition to LFT approaches, researchers have suggested other effective methods for producing S-boxes. For example, the authors in 14 offered a method that uses the concept of cubic fractional transformation (CFT) to create effective S-boxes. Another straightforward and efficient method, based on cubic polynomial transformation, was proposed in 15. In 16, a novel method for building robust S-boxes with linear transformation was put forward; this method strengthens the resulting S-box using a permutation process. DNA computing has also been employed to construct resilient S-boxes and cryptographically robust ciphers using the techniques provided in [17][18][19][20][21][22]. Analysis showed these S-boxes and ciphers to be strong in terms of cryptographic criteria and resistance to various attacks. Owing to the unpredictability of chaotic systems, chaos is another important field exploited in cryptography for creating reliable S-boxes 20. Chaos theory was utilized by several scholars [23][24][25][26] to construct S-boxes with strong cryptographic properties. Hyperchaotic schemes were employed by the authors of 27,28 to build robust and reliable S-boxes. Many other researchers have developed S-boxes using a variety of different methods, including cellular automata 29, graph theory 30, elliptic curves 31, and optimization approaches 32.
The construction of S-boxes relies on certain key assumptions to ensure their security and effectiveness. Here are some key assumptions for the construction of S-boxes:
• S-boxes should exhibit strong non-linearity, which means that they should not be easily expressible as a linear function of their inputs. This property helps prevent attackers from exploiting algebraic relationships between input and output bits.
• S-boxes should satisfy the avalanche effect, where a change in a single input bit should cause approximately half of the output bits to change on average. This property ensures that small changes in the plaintext result in substantial changes in the cipher text.
• S-boxes should be designed to resist known cryptanalysis techniques such as differential and linear cryptanalysis. These techniques attempt to find patterns or correlations between plaintext, cipher text, and key bits, and strong S-boxes can help thwart these attacks.
• S-boxes should be bijective, meaning that each input value maps to a unique output value, and vice versa. This property ensures that the S-box can be inverted, allowing for decryption.
In this manuscript, a text-theoretical approach is used for the construction of effective S-boxes. Text encryption using the adjacency matrices of graphs is used in different branches of research and is frequently applied to a wide range of practical problems. Mathematicians are becoming increasingly aware of the importance of graph theory, which is employed in electrical engineering, solid-state physics, organic chemistry, statistical mechanics, operations research, and optimization theory. Numerous other fields, including computer science, mathematics, and engineering, are impacted by the study of cryptography [33][34][35][36][37].
The significant contributions of this study are as follows:
• To generate the initial S-boxes, a novel and straightforward graph-theoretic method is proposed, and subsequently its nonlinearity is enhanced by employing a permutation from the symmetric group.
• Conventional S-box assessment criteria and S-boxes available in the existing literature are used to analyze the proposed S-box. This performance investigation confirms the contribution of the proposed S-box.
• The sample S-box produced by the suggested method is used for image encryption. According to simulation and effectiveness evaluations, the S-box-based encryption performs very well.
The subsequent sections of the paper are structured as follows. The construction of S-boxes utilizing graphical techniques is covered in section "Algebraic structure of S-box". Section "Algebraic analysis" compares the performance of the S-box created using the suggested approach with that of recent S-boxes. In section "Image encryption", the suggested S-box is tested and evaluated for use in image encryption. Finally, section "Conclusion" concludes the study.

Algebraic structure of S-box
The suggested method for constructing substitution boxes consists of three stages. First, an LFT is generated from a phrase using concepts from graph theory. Second, this LFT generates an S-box. Finally, a symmetric group permutation is used to enhance the non-linearity of the initially generated S-box. The initial phase of the algorithm is based on graph theory because text characters are hidden in a graph according to the graph's vertices and their degrees. Every character of the text is first converted to its ASCII code and then to binary notation. The order of the matrix for each character is then determined individually by evaluating XOR for that character. The number of rows and columns of a matrix corresponds to the number of nodes of a graph, and, using the matrix as the adjacency matrix of a graph, we follow the same procedure to create the graph. Each step of the proposed approach to constructing S-boxes is given in Algorithm 2.1, and the flow chart for the construction of the S-box is given in Fig. 1.
www.nature.com/scientificreports/

Construction of a specimen S-box
Suppose we want to create an S-box from the word "UNITY". Using the ASCII table, we obtain the binary format of each of its letters, as shown in Table 1.
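The character-to-binary step can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `to_ascii_bits` is hypothetical, and padding each code to 8 bits is an assumption (the layout of Table 1 is not reproduced here).

```python
def to_ascii_bits(word):
    """Return (character, ASCII code, 8-bit binary string) for each character.

    Padding to 8 bits is an assumption made for this sketch.
    """
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in word]


for ch, code, bits in to_ascii_bits("UNITY"):
    print(ch, code, bits)
```

For "UNITY" this lists the codes 85, 78, 73, 84 and 89 with their binary strings; any other word can be passed in the same way.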
Algorithm 2.1 is then used to create a square symmetric matrix for each letter, as shown in Table 2.

Table 2 (column headings: Alphabets, Adjacency Matrices).
We now obtain the desired LFT by taking the reciprocal of the sum of all the mappings in the last column. Next, the values from 0 to 255 are substituted into the mapping provided in Eq. (2); to keep all outputs in the range 0 to 255, f(64) is replaced with 0 and f(131) with 191. After reducing all of the outputs modulo 257, we obtain the initial S-box shown in Table 3.
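To illustrate how an LFT evaluated modulo 257 can yield an 8-bit S-box, the sketch below uses the generic form f(x) = (a·x + b)/(c·x + d) mod 257. The coefficients (35, 15, 9, 5) are hypothetical placeholders, not the values derived from "UNITY", and the repair rule (filling an undefined or out-of-range output with the smallest unused value) is an assumption standing in for the specific replacements of f(64) and f(131) described above.

```python
P = 257  # prime modulus, as in the text


def lft_sbox(a, b, c, d, p=P):
    """Build a 256-entry S-box from f(x) = (a*x + b) / (c*x + d) mod p."""
    if (a * d - b * c) % p == 0:
        raise ValueError("degenerate LFT: a*d - b*c must be nonzero mod p")
    raw = []
    for x in range(256):
        den = (c * x + d) % p
        if den == 0:
            raw.append(None)  # pole of the LFT: output undefined
        else:
            # pow(den, -1, p) is the modular inverse (Python 3.8+)
            raw.append((a * x + b) * pow(den, -1, p) % p)
    used = {v for v in raw if v is not None and v < 256}
    missing = sorted(set(range(256)) - used)
    # Assumed repair rule: replace undefined or out-of-range outputs
    # with the unused values, smallest first.
    return [missing.pop(0) if (v is None or v > 255) else v for v in raw]


sbox = lft_sbox(35, 15, 9, 5)  # hypothetical coefficients
print(sorted(sbox) == list(range(256)))
```

Because a non-degenerate LFT is injective, at most two of the 256 outputs need repair (one pole and one output equal to 256), which matches the two replacements mentioned in the text.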
The Boolean functions of this initial S-box have an average non-linearity value of 99.75. A more resilient S-box can be created by applying a symmetric group permutation to increase this value. In this instance, the 256 cells of Table 3 are subjected to a symmetric group permutation, and the resulting S-box is given in Table 4.
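Rearranging the 256 cells of an S-box by a permutation can be sketched as below. The permutation used here is a seeded random shuffle chosen purely for illustration; it is not the permutation used by the authors, and the function name `permute_cells` is hypothetical.

```python
import random


def permute_cells(sbox, perm):
    """Return a new table whose cell i holds the old cell perm[i]."""
    if sorted(perm) != list(range(len(sbox))):
        raise ValueError("perm must be a permutation of the cell indices")
    return [sbox[perm[i]] for i in range(len(sbox))]


initial = list(range(256))          # stand-in for an initial S-box
perm = list(range(256))
random.Random(2024).shuffle(perm)   # hypothetical permutation
final = permute_cells(initial, perm)
print(sorted(final) == list(range(256)))
```

Any such rearrangement keeps the table bijective, so only properties such as nonlinearity need to be re-evaluated afterwards.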

Algebraic analysis
For effective encryption, an S-box must have robust algebraic properties that enable nonlinear associations between the input and output of data. This section analyzes the created S-box given in Table 4 using the following standard criteria for judging the cryptographic strength of substitution boxes.
For comparison, we additionally consider several S-boxes obtained from recently published construction approaches.

Bijective
A mapping is said to be bijective if every input has a distinct output and every possible output is attained. An S-box design should clearly exhibit this characteristic. For an 8 × 8 S-box, every input value in the interval [0, 255] should result in a unique output value in the interval [0, 255]. This condition is satisfied by the created S-box in Table 4, since it contains all 256 distinct output values in the interval [0, 255]. Additionally, each Boolean function is balanced: the number of 1's is the same as the number of 0's.
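Both checks described above are easy to automate. The following is a minimal sketch; the function names are hypothetical, and the S-box is assumed to be given as a list of 256 integers.

```python
def is_bijective(sbox):
    """True if the S-box takes every value 0..len(sbox)-1 exactly once."""
    return sorted(sbox) == list(range(len(sbox)))


def is_balanced(sbox, n_bits=8):
    """True if every output-bit Boolean function has as many 1's as 0's."""
    half = len(sbox) // 2
    return all(sum((y >> bit) & 1 for y in sbox) == half for bit in range(n_bits))


identity = list(range(256))  # trivial example table
print(is_bijective(identity), is_balanced(identity))
```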

Non-linearity
An S-box design is considered better if it maps the given inputs to outputs in a highly nonlinear way. This kind of S-box is useful for thwarting attempts at linear cryptanalysis. Using Eq. (3), as provided by 38, one may determine the nonlinearity of a Boolean function f:

$$NL(f) = 2^{n-1} - \frac{1}{2}\max_{a \in \mathbb{F}_2^n} \left| W_f(a) \right| \qquad (3)$$

Here $W_f(a) = \sum_{x \in \mathbb{F}_2^n} (-1)^{f(x) \oplus a \cdot x}$ is the value of the Walsh spectrum. The eight balanced Boolean functions that make up the proposed S-box have non-linearity values of 112, 112, 112, 110, 112, 112, 112, and 110. The lowest, highest, and mean nonlinearity values for our S-box are therefore 110, 112, and 111.5, respectively. Figure 2 shows a graphical comparison of our S-box's nonlinearity results with those of recently developed S-box construction methods. The comparison shows that the nonlinearity of the proposed S-box exceeds that of numerous other existing S-boxes.
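The nonlinearity computation can be sketched with a fast Walsh-Hadamard transform. This is an illustrative implementation under the standard definition NL(f) = 2^(n-1) − max|W_f(a)|/2; the function names are hypothetical, and the S-box is assumed to be a list of 256 integers.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f(x); entry a holds W_f(a)."""
    w = [1 - 2 * bit for bit in truth_table]
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                u, v = w[j], w[j + h]
                w[j], w[j + h] = u + v, u - v
        h *= 2
    return w


def nonlinearity(truth_table):
    """Nonlinearity of a Boolean function given as a truth table of length 2^n."""
    n = len(truth_table).bit_length() - 1
    return 2 ** (n - 1) - max(abs(v) for v in walsh_spectrum(truth_table)) // 2


def sbox_nonlinearities(sbox, n_bits=8):
    """Nonlinearity of each output-bit Boolean function of the S-box."""
    return [nonlinearity([(y >> bit) & 1 for y in sbox]) for bit in range(n_bits)]


reported = [112, 112, 112, 110, 112, 112, 112, 110]  # values quoted in the text
print(min(reported), max(reported), sum(reported) / len(reported))
```

The last line reproduces the summary statistics from the eight reported values; running `sbox_nonlinearities` on the actual Table 4 would be needed to confirm the individual entries.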

Strict Avalanche Criterion (SAC)
The SAC requires that flipping a single input bit changes each output bit with probability 0.5, so that about 50% of the output bits change 39. A strong S-box therefore has a SAC score close to 0.5. Table 5 provides the dependence matrix of our S-box's SAC values. The mean SAC score of our S-box is 0.4978, which is very close to 0.5, so the proposed S-box adequately fulfills the SAC requirement. The comparison in Table 8 shows that the SAC score of the proposed S-box is consistent with those of other S-boxes.
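A SAC dependence matrix of the kind reported in Table 5 can be computed as sketched below. The code assumes the usual convention that entry (i, j) is the fraction of inputs for which flipping input bit i flips output bit j; the function name is hypothetical.

```python
def sac_matrix(sbox, n_bits=8):
    """Entry [i][j]: fraction of inputs where flipping input bit i flips output bit j."""
    size = len(sbox)
    matrix = []
    for i in range(n_bits):            # input bit that is flipped
        row = []
        for j in range(n_bits):        # output bit that is observed
            flips = sum(((sbox[x] ^ sbox[x ^ (1 << i)]) >> j) & 1 for x in range(size))
            row.append(flips / size)
        matrix.append(row)
    return matrix


def mean_sac(sbox, n_bits=8):
    """Average of all entries of the SAC dependence matrix."""
    m = sac_matrix(sbox, n_bits)
    return sum(sum(row) for row in m) / (n_bits * n_bits)
```

For a strong S-box every entry should be near 0.5; a linear table such as the identity gives only 0's and 1's.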

Bit Independence Criterion (BIC)
The BIC requires that, when a single input bit is changed, the resulting changes in any two output bits are independent of one another 39. The BIC-SAC and BIC nonlinearity (BIC-NL) values of the created S-box are shown in Tables 6 and 7, respectively. The average BIC-NL and BIC-SAC values for the constructed S-box are 103.86 and 0.4964, respectively. These results suggest that the proposed S-box satisfies the BIC criterion. Moreover, Table 8 provides a comparison of BIC values for various S-boxes.

Linear probability (LP)
In the design of a trustworthy block cipher, the bits of plaintext must be mixed up so that attackers holding the cipher text cannot determine the bits' initial sequence. This confusion is made possible by a strong S-box construction that establishes a nonlinear relationship between plaintext bits and cipher text bits. The linear probability described in Eq. (4) is used to assess this algebraic feature of an S-box 40.

$$LP = \max_{t_x,\, t_y \neq 0} \left| \frac{\#\{x \in X \mid x \cdot t_x = S(x) \cdot t_y\}}{2^n} - \frac{1}{2} \right| \qquad (4)$$

Here, $t_x$ denotes the input mask, $t_y$ denotes the output mask, $X$ is the set of all inputs, and $2^n$ is the number of its elements. If the relationship between the input and output bits is close to linear, the LP value of the S-box will be larger and attackers can more easily perform linear cryptanalysis. The proposed S-box has an LP score of 0.125, which is low enough to withstand linear cryptanalysis. Table 8 compares the LP values of a few existing S-boxes with that of the proposed S-box. The comparison shows that the proposed S-box resists linear cryptanalysis about as well as other existing S-boxes.
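A minimal sketch of how the linear probability of an 8-bit S-box could be computed, assuming the common definition: the maximum, over nonzero input and output masks, of the deviation from 1/2 of the fraction of inputs for which the masked input parity equals the masked output parity. The function names are hypothetical; a Walsh-Hadamard transform is used so that all input masks are handled at once for each output mask.

```python
def parity(v):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(v).count("1") & 1


def linear_probability(sbox, n_bits=8):
    """max over nonzero masks of |#{x : x.tx = S(x).ty} / 2^n - 1/2|."""
    size = 1 << n_bits
    best = 0
    for ty in range(1, size):
        # Sign sequence of the masked output parity, then in-place transform.
        w = [1 - 2 * parity(sbox[x] & ty) for x in range(size)]
        h = 1
        while h < size:
            for i in range(0, size, 2 * h):
                for j in range(i, i + h):
                    u, v = w[j], w[j + h]
                    w[j], w[j + h] = u + v, u - v
            h *= 2
        # w[tx] = 2 * (number of agreements) - 2^n; skip tx = 0.
        best = max(best, max(abs(v) for v in w[1:]))
    return best / (2 * size)
```

Under this definition, an LP of 0.125 corresponds to a largest agreement count that differs from 128 by 32, while a fully linear table attains the maximum of 0.5.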

Differential uniformity (DU)
Attackers can recover all or part of the plaintext or secret key through differential attacks 41. Analysts assess the differential uniformity (DU) of S-boxes to quantify this vulnerability. An S-box's DU should be relatively small to withstand differential cryptanalysis. Table 9 provides the DU values of the constructed S-box, which were determined using Eq. (3). The maximum DU value of the proposed S-box is 10. As a result, the differential probability (DP) equals 10/256 ≈ 0.039, indicating that the proposed S-box provides respectable resistance to differential cryptanalysis. The comparison of DP values between the proposed S-box and numerous other S-boxes is shown in Table 8. Figure 3 shows a graphical comparison of our S-box's maximum DU results with those of recently developed S-box construction methods.
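The input/output XOR distribution and the resulting DU and DP values can be sketched as follows. The code assumes the usual definition, DU = max over nonzero input differences Δx and all output differences Δy of #{x : S(x) ⊕ S(x ⊕ Δx) = Δy}; the function names are hypothetical.

```python
def differential_uniformity(sbox):
    """Largest entry of the XOR distribution table over nonzero input differences."""
    size = len(sbox)
    worst = 0
    for dx in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1
        worst = max(worst, max(counts))
    return worst


def differential_probability(sbox):
    """DP = DU / 2^n for an S-box with 2^n entries."""
    return differential_uniformity(sbox) / len(sbox)


print(round(10 / 256, 3))  # DP for the DU value of 10 quoted in the text
```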

Image encryption
S-boxes are an essential component of image encryption algorithms because they provide non-linearity, confusion, and protection against various cryptographic attacks.They help ensure the confidentiality and security of encrypted images by making it extremely difficult for unauthorized parties to reverse-engineer the encryption process or recover the original image without the correct decryption key.Image encryption analysis involves evaluating the security and effectiveness of an image encryption scheme to ensure that it meets the desired security goals.This section carefully analyzes the robustness of the created S-box given in Table 4 in image encryption using several standards for judging the cryptographic potency of substitution boxes.

Majority Logic Criterion (MLC)
Using various images, the Majority Logic Criterion performs several statistical analyses to determine the statistical robustness of the S-box in image encryption 58. The distortion that the encryption process generates in the picture determines how effective the suggested method is. Several analyses are required to examine the statistical characteristics, including correlation, entropy, contrast, homogeneity, and energy. JPG images of Cameraman, Pepper, and Baboon taken from the article 65 are used for MLC analysis. The outcome of image encryption using the suggested S-box, together with the histograms, is shown in Table 10. Tables 11, 12 and 13 compare the outcomes of this analysis with those of other well-known S-boxes. These findings suggest that the suggested S-box is appropriate for encryption applications and suitable for inclusion in algorithms designed for the safe transfer of information and data.

Peak Signal-to-Noise Ratio (PSNR)
The Peak Signal-to-Noise Ratio is a metric used to quantify the quality of a reconstructed or compressed signal, typically an image or video, compared to the original, uncompressed signal. PSNR is commonly used in fields such as image processing, video compression, and signal processing to assess the fidelity of the reconstructed data. The PSNR value is calculated as the ratio of the peak signal power to the noise power, typically expressed in decibels (dB). A higher PSNR value indicates better quality, as it means that the reconstructed signal is closer to the original signal and has less distortion or noise. The formula for calculating PSNR is:

$$PSNR = 10 \log_{10}\left(\frac{R^2}{MSE}\right)$$

where $R$ is the maximum possible pixel value of the image and MSE is the Mean Squared Error. The PSNR results of encrypted images are given in Table 14. The MSE is computed as

$$MSE = \frac{1}{N}\sum_{i=1}^{N} (X_i - Y_i)^2$$

where $N$ is the total number of pixels, $X_i$ represents the pixel value in the original image and $Y_i$ represents the corresponding pixel value in the reconstructed or compressed image. The MSE results of encrypted images are given in Table 14.
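PSNR and MSE as described above can be sketched in a few lines. The sketch assumes images are given as flat lists of pixel values and that the peak value R is 255 (8-bit grayscale); both are assumptions made for illustration.

```python
import math


def mse(original, other):
    """Mean squared difference between two equally sized pixel sequences."""
    if len(original) != len(other):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(original, other)) / len(original)


def psnr(original, other, peak=255):
    """PSNR in dB; infinite when the two images are identical."""
    err = mse(original, other)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


print(mse([0, 0], [3, 4]), psnr([0, 0], [3, 4]))
```

In the encryption setting the second image is the cipher image, so a large MSE and a correspondingly small PSNR indicate that the encrypted image differs strongly from the original.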

Normalized Pixel Change Rate (NPCR)
Normalized Pixel Change Rate is a metric used to evaluate the quality and security of image encryption or data hiding techniques. It is commonly used in the field of information security and cryptography, particularly when assessing the performance of encryption algorithms applied to images. The result is usually expressed as a percentage. In cryptographic applications, a high NPCR value (close to 100%) is desired because it indicates that a small change in the plaintext image results in a significant change in the encrypted image, making it harder for an attacker to deduce information about the original image from the encrypted version. The formula for calculating NPCR is as follows:

$$NPCR = \frac{N_c}{N} \times 100\%$$

where $N_c$ is the number of pixel positions whose values differ between the two images being compared and $N$ is the total number of pixels in the image. The NPCR results of encrypted images are given in Table 14.
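A minimal sketch of the NPCR computation, assuming the two images are flat lists of pixel values of equal length and that NPCR is the percentage of positions at which they differ. The function name is hypothetical.

```python
def npcr(image_a, image_b):
    """Percentage of pixel positions at which the two images differ."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    changed = sum(1 for a, b in zip(image_a, image_b) if a != b)
    return 100.0 * changed / len(image_a)


print(npcr([1, 2, 3, 4], [1, 0, 3, 0]))
```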

Unified Average Changing Intensity (UACI)
Unified Average Changing Intensity is a metric used to evaluate the quality and security of image encryption or data hiding techniques. The UACI metric measures the average change in pixel intensity between two images, typically an image and its encrypted counterpart. Because UACI measures how strongly pixel intensities change, a very low value indicates that the encrypted image remains close to the reference image; for 8-bit images, the expected UACI between two independent random images is about 33.46%, and values near this figure are generally taken to indicate effective encryption. The formula for calculating UACI is as follows:

$$UACI = \frac{1}{N}\sum_{i=1}^{N} \frac{|X_i - Y_i|}{R} \times 100\%$$

where $N$ is the total number of pixels in the images, $X_i$ represents the pixel value in the original image, $Y_i$ represents the corresponding pixel value in the encrypted image and $R$ is the maximum possible pixel value. The UACI results of encrypted images are given in Table 14.
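The UACI computation can be sketched as below, assuming flat lists of pixel values and a peak value R of 255 (8-bit grayscale); the function name is hypothetical.

```python
def uaci(image_a, image_b, peak=255):
    """Average absolute intensity difference, as a percentage of the peak value."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    total = sum(abs(a - b) for a, b in zip(image_a, image_b))
    return 100.0 * total / (peak * len(image_a))


print(uaci([0, 255], [0, 0]))
```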

Complexity analysis
This section investigates the computational complexity of the suggested encryption approach, which involves partitioning the image into Most Significant Bits (MSBs) and Least Significant Bits (LSBs). Each pixel in the image is divided into two sub-blocks with a constant time complexity of $O(1)$. The initial stage requires $O(M \times N)$ bit operations, where $M \times N$ is the dimension of the plain image. The substitution module also operates in time linear in the number of pixels. Because all of the algorithm's modules run in linear time, the scheme's overall computational complexity is $O(M \times N)$. By comparison, in the algorithm of 64, the computational time complexity for creating the cross-coupled chaotic sequence for one round is $O(2 \times M_x)$, where $M_x$ is the maximum of $M_1$ and $M_2$. The row-column permutation stage has a time complexity of $O(M_1 \times N_1)$, while the row-column diffusion operation has a computational complexity of $8(M_1 + N_1)$. As a result, the overall computational time complexity of the encryption technique in 64 is $O(2M_x + 9(M_1 + N_1))$, which is nearly equivalent to that of the suggested algorithm.

Correlation
Correlation analysis in image encryption examines the correlation between adjacent pixel pairs in both the original and encrypted images, in the vertical, horizontal, and diagonal directions. The vertical, horizontal, and diagonal correlations are calculated by comparing the values of pixels that are adjacent vertically, horizontally, and diagonally, respectively. Each of these correlations provides insight into how well the encryption algorithm disrupts spatial relationships in the image. The correlation results of encrypted images are given in Table 15.
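One common way to carry out this analysis is to compute the Pearson correlation coefficient between each pixel and its neighbour in the chosen direction. The sketch below assumes this standard coefficient and a grayscale image stored as a list of rows; whether all adjacent pairs or only a random sample are used in the paper is not stated, so using all pairs is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)  # undefined (division by zero) for constant data


def adjacent_correlation(image, direction):
    """Correlation between each pixel and its neighbour in the given direction."""
    dr, dc = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}[direction]
    rows, cols = len(image), len(image[0])
    xs, ys = [], []
    for r in range(rows - dr):
        for c in range(cols - dc):
            xs.append(image[r][c])
            ys.append(image[r + dr][c + dc])
    return pearson(xs, ys)


gradient = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]  # toy image with strong spatial structure
print(adjacent_correlation(gradient, "horizontal"))
```

A plain image typically gives values near 1 in all three directions, whereas a well-encrypted image should give values near 0.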

Conclusion
In this work, we presented a novel and simple text-theoretical method for the creation of S-boxes and constructed a sample S-box using the word "UNITY" as an example. The algebraic features of this newly designed S-box are strong enough to resist linear and differential attacks, and its performance is superior to that of many other S-boxes. In particular, the mean NL score of the proposed S-box is 111.5, which is high compared with many other S-boxes. The specimen S-box was also used to encrypt the Cameraman, Pepper, and Baboon images. To evaluate the performance of the specimen S-box in image encryption, various statistical analyses such as PSNR, MSE, NPCR, UACI, and MLC were used. In particular, the low correlation values of -0.0012, 0.0000, and 0.0006 for the vertical, horizontal, and diagonal directions, respectively, of the Pepper image demonstrate the robustness of encryption using the specimen S-box. In future work, the proposed method can be used for dynamic S-box construction from any sentence or paragraph.

Figure 3. A graphical comparison between DU values of various S-boxes.

Table 1. Binary format for each character of the word "UNITY".

Table 5. Strict Avalanche values of created S-box.

Table 8. A comparison between algebraic properties of different well-known S-boxes.

Table 9. Input/output XOR distribution table of created S-box.

Table 10. Original and encrypted images with their histograms.

Mean Squared Error (MSE)
Mean Squared Error is a common metric used to measure the average squared difference between the values predicted by a model or estimation technique and the actual values. The MSE test is a mathematical calculation used for evaluating the performance of models or estimators, especially in the context of regression analysis or image processing. In the context of image processing and compression, MSE is often used to quantify the quality of a reconstructed or compressed image compared to the original image. It provides a numerical value that represents the average squared difference in pixel values between the two images. A lower MSE indicates better image fidelity, as it implies that the reconstructed image is closer to the original. It is computed by averaging the squared pixel-wise differences over all pixels.

Table 11. MLC comparison of Cameraman image using the proposed S-box with various other S-boxes.

Table 12. MLC comparison of Baboon image using the proposed S-box with various other S-boxes.

Table 13. MLC comparison of Pepper image using the proposed S-box with various other S-boxes.

Table 15. Correlation results of encrypted images.